Tools and Algorithms for Querying and Mining Large Graphs Thesis Proposal
Abstract
Graphs appear in a wide range of settings, such as computer networks, the World Wide Web, biological networks, social networks (MSN/Facebook/LinkedIn), and many more. How can we find user-specific patterns (e.g., a mastermind or a money-laundering ring) in such graphs? How can we spot anomalies in a dynamic and intuitive way? How can we find communities subject to optional constraints? How can we mine temporal and spatial information in a complex context? In this thesis, we focus on two types of tasks, distinguished by how they interact with users: (1) querying and (2) mining. For querying, we have studied three subtasks. First, we focus on how to find complex user-specific patterns in large graphs, where we have addressed three applications: (1) center-piece subgraph discovery for plain graphs; (2) best-effort pattern matching for attributed graphs; and (3) querying with feedback. Second, we focus on (1) how to predict the direction of a link and (2) how to address the temporal issue in querying. Finally, since the main tool for querying large graphs is proximity measurement, we have designed a family of fast solutions (FastProx) for several different settings, which often achieve up to two orders of magnitude speedup with little or no quality loss. For mining, we have studied two subtasks. First, we have proposed a family of example-based low-rank approximation methods (Colibri) for anomaly detection. These methods work for both static and dynamic graphs, and achieve significant speedups and space savings over existing methods without quality loss. Second, we have designed T3 to mine temporal information in the context of graphs; it can find similar time stamps as well as abnormal time stamps, and also provides interpretations for its findings. Furthermore, we have proposed MT3 to speed up multi-resolution analysis.
Future work includes two aspects. First, we would like to design tools to detect communities in graphs with optional constraints. Second, we will study how to mine spatial information in the context of graphs. In addition, we plan to investigate diffusion wavelets as an alternative approach to querying and mining large graphs.
Similar resources
WOOster: A Map-Reduce based Platform for Graph Mining
Large-scale graphs containing billions of vertices are becoming increasingly common in various applications. With graphs of such proportions, an efficient querying infrastructure becomes crucial. In this paper, we propose WOOster, a hosted querying infrastructure designed specifically for large graphs. We make two key contributions: a) Design of the WOOster framework. b) Scalable map-reduce alg...
Mining Billion-Scale Graphs: Patterns and Algorithms
Graphs are everywhere: social networks, the World Wide Web, biological networks, and many more. The sizes of graphs are growing at an unprecedented rate, spanning millions and billions of nodes and edges. What are the patterns in large graphs, spanning gigabytes and terabytes and heading toward petabytes? What are the best tools, and how can they help us solve graph mining problems? How do we scale up algori...
Fast Algorithms for Querying and Mining Large Graphs
Graphs appear in a wide range of settings and have posed a wealth of fascinating problems. In this thesis, we focus on two types of tasks according to the interaction with users: (1) querying (e.g., given a social network, how to measure the closeness between two persons? how to track it over time?) and (2) mining (e.g., how to identify abnormal behaviors of computer networks? In the case of vi...
Mining Approximate Frequent Patterns from Graph Databases
Graph analytics is the process of discovering patterns and insights from data that can be modeled as graphs. Algorithms for graph analytics fall into two broad categories: mining and management. Graph mining algorithms are often used in graph management and vice versa. In recent times, these algorithms have become an indispensable tool for analyzing networks in domains such as i) Computational...
Managing Massive Graphs
Many real graphs today constitute some of the largest data sets. Some of the best representatives of these graphs are the web graph, the interconnection network graph, the telephone call graph, social networks, and query log graphs. Managing and finding relevant information in large graphs are challenging problems in current research. The need to deal with massive graphs has increased the interest...